class: center, middle, inverse, title-slide

.title[
# Advances in Pass Rush and Tackling Evaluation in American Football with Player Tracking Data
]
.subtitle[
## Quang Nguyen
]
.institute[
### Department of Statistics & Data Science
Carnegie Mellon University
JSM 2024
]
.date[
### @qntkhvn
qntkhvn
qntkhvn.netlify.app
]

---

<!-- layout: true -->

<!-- <div class="my-footer"><span></span></div> -->

---

# The challenges and the opportunities

--

Defensive performance has been an understudied part of American football

--

Challenges:

* Previous metrics: box-score statistics or based on subjective judgment
* Not all [INSERT STATS] are created equal
* Positions like defensive linemen lack sufficient recorded statistics

--

Opportunities: tracking data provided by the NFL Big Data Bowl

* 2023 theme: linemen on pass plays
* 2024 theme: tackling

---
count: false
class: center, middle

<p style="font-size:4em; color: #C41230"> <b> Pass rush evaluation </b> </p>

---

# Team

.pull-left[
<img src="data:image/png;base64,#broseph.jpg" width="85%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#teamstrain.jpg" width="75%" style="display: block; margin: auto;" />
]

---

# Previous metrics for pass rush evaluation

* Play-level stats: sacks, hits, and hurries

--

* Pass rush win rate
  * uses an arbitrary time threshold to define a pass rush win
  * converts continuous data to a win-loss indicator

<!-- --- -->
<!-- # How do we measure pass rusher effectiveness? -->
<!-- * Distance to QB -->
<!-- -- -->
<!-- * Speed towards QB -->
<!-- -- -->
<!-- * How about both? -->
<!-- -- -->
<!-- Think simple...
-->
<!-- * An ideal statistic should increase as speed increases and distance decreases -->
<!-- * Consider speed divided by distance -->

---

# football meets materials science

<img src="data:image/png;base64,#analogy2.png" width="110%" style="display: block; margin: auto;" />

---
count: false

# football meets materials science

<img src="data:image/png;base64,#analogy3.png" width="110%" style="display: block; margin: auto;" />

---
count: false

# football meets materials science

<img src="data:image/png;base64,#analogy4.png" width="110%" style="display: block; margin: auto;" />

---

# measuring pressure at the frame-level

**Definition** (`\(\text{STRAIN}\)`; informal)

--

For every pass rusher at each frame within a play, calculate

--

* `\(d\)`: distance between pass rusher and QB

--

* `\(v\)`: velocity of the pass rusher relative to the QB, i.e., the rate of change of `\(d\)` (negative when the rusher is closing in)

--

* `\(\text{STRAIN} = \displaystyle- \frac{v}{d}\)`

---

# Advantages of STRAIN

* Simple, computation friendly: 2 features! #PutThatInYourXGBoost

--

* Interpretable: 1/STRAIN <br> `\(\rightarrow\)` time required for a pass rusher to get to QB with current velocity & distance

--

* Scalable: can be applied to every passing play (minus trick plays)

--

* Continuous-time within-play metric

--

* Properties: discrimination, predictability, face validity

---

# data: nfl big data bowl 2023

Player tracking data for the first 8 weeks of the 2021 regular season

Example play: Raiders vs Broncos (week 6) – T. Bridgewater sacked by M.
Crosby

<img src="data:image/png;base64,#track.png" width="80%" style="display: block; margin: auto;" />

---

# example play for Raiders DE <span style="color: #1143E2">Maxx Crosby</span>

<img src="data:image/png;base64,#play_ini.gif" width="85%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#ex_play.gif" width="62%" style="display: block; margin: auto;" />

---

# STRAIN differentiates between positions (edge & interior)

<img src="data:image/png;base64,#pos.png" width="70%" style="display: block; margin: auto;" />

---

# STRAIN differentiates between play outcomes

<img src="data:image/png;base64,#outcomes.png" width="99%" style="display: block; margin: auto;" />

---

# STRAIN is more predictive of pressure than pressure itself

<img src="data:image/png;base64,#predictability.png" width="100%" style="display: block; margin: auto;" />

---

# STRAIN is highly stable over time

<img src="data:image/png;base64,#stability.png" width="60%" style="display: block; margin: auto;" />

---

# Multilevel model for STRAIN

`$$\small \begin{aligned} \overline{\text{STRAIN}}_{ij} &\sim N(R_{j[i]} + B_{b[ij]} + D_{d[i]} + O_{o[i]} + \mathbf{x_{ij}} \boldsymbol{\beta}, \sigma^2); \text{ } i = 1, \dots, n \text{ plays} \\ R_{j} &\sim N(\mu_R, \sigma^2_R); \text{ } j = 1, \dots, \text{# of rushers} \\ B_{b} &\sim N(\mu_B, \sigma^2_B); \text{ } b = 1, \dots, \text{# of blockers} \\ D_{d} &\sim N(\mu_D, \sigma^2_D); \text{ } d = 1, \dots, \text{# of defenses} \\ O_{o} &\sim N(\mu_O, \sigma^2_O); \text{ } o = 1, \dots, \text{# of offenses} \end{aligned}$$`

- Response: average STRAIN for pass rusher at the play level
- Random effects: pass rusher, pass blocker, defensive team, offensive team
- Fixed effects: number of blockers involved in a play, position of blocker and rusher, play context (down, yards to go, current yardline)

---

# Pass rusher rankings obtained from resampling

<img src="data:image/png;base64,#boot_rank.png" width="76%" style="display: block; margin: auto;"
/>

---
count: false
class: center, middle

<p style="font-size:4em; color: #C41230"> <b> Tackling evaluation </b> </p>

---

# Team

<img src="data:image/png;base64,#team.png" width="70%" style="display: block; margin: auto;" />

---

# EXISTING TACKLING METRICS ARE FLAWED

* Tackles (solo or assisted) are unofficial stats
* StatsBomb treats assisted tackles as 0.5
* PFF created "stops"
* PFF tracks "missed tackles"

--

These are only discrete counting stats!

---

# What is the purpose of tackling?

<img src="data:image/png;base64,#purpose.png" width="80%" style="display: block; margin: auto;" />

---

# Tackling evaluation framework

* Purpose of tackling: to halt the **forward motion** of the ball carrier

--

* Measuring forward motion: **velocity** toward the end zone

--

Proposal: 3-step model-free framework, aiming to

* Measure tackling contribution throughout a play
* Assign defensive credit for halting ball carrier’s forward motion
* Provide a continuous metric for defensive performance

--

Data: First 9 weeks of the 2022 regular season; only consider RB run plays

---

# example play for Giants RB Saquon Barkley

<img src="data:image/png;base64,#barkley_run.gif" width="80%" style="display: block; margin: auto;" />

---

# STEP 1: identifying contact windows

<img src="data:image/png;base64,#cw1.png" width="98%" style="display: block; margin: auto;" />

---
count: false

# STEP 1: identifying contact windows

<img src="data:image/png;base64,#cw2.png" width="98%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#barkley.gif" width="52%" style="display: block; margin: auto;" />

---

# Step 2: valuing a contact window

<img src="data:image/png;base64,#value1.png" width="75%" style="display: block; margin: auto;" />

---
count: false

# Step 2: valuing a contact window

<img src="data:image/png;base64,#value2.png" width="75%" style="display: block; margin: auto;" />

---

# Step 3: crediting individual players

<img src="data:image/png;base64,#credit.png" width="98%"
style="display: block; margin: auto;" />

---

# Fractional tackles summary for example play

<img src="data:image/png;base64,#ftsummary.png" width="75%" style="display: block; margin: auto;" />

---

# TACKLES and ASSISTS ARE OVERSTATED STATISTICS

<img src="data:image/png;base64,#overstate.png" width="50%" style="display: block; margin: auto;" />

---

# Fractional tackles possess great stability

<img src="data:image/png;base64,#grid.png" width="80%" style="display: block; margin: auto;" />

---

# Does momentum (size) matter?

<img src="data:image/png;base64,#https://pbs.twimg.com/media/Djb2RYGX4AEeObf?format=jpg" width="44%" style="display: block; margin: auto;" />

---
count: false
class: center, middle

<p style="font-size:4em; color: #C41230"> <b> Challenges still exist </b> </p>

---

# Lesson learned: let's take a step back

<!-- <blockquote class="twitter-tweet"><p lang="en" dir="ltr">The more experience I've gained with player-tracking data, the more I realize the importance of simplicity. This is not statistical work - but rather focusing on the foundation: observed data.<br><br>Next steps: statistical work! What explains variation, impacts momentum changes, etc.</p>— Ron Yurko (@Stat_Ron) <a href="https://twitter.com/Stat_Ron/status/1744483411505414583?ref_src=twsrc%5Etfw">January 8, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> -->

> The more experience I've gained with player-tracking data, the more I realize the importance of simplicity. This is not statistical work - but rather focusing on the foundation: observed data.<br><br>Next steps: statistical work!...<br><br>— Ron Yurko (@Stat_Ron) <a href="https://twitter.com/Stat_Ron/status/1744483411505414583?ref_src=twsrc%5Etfw">January 8, 2024</a>

<!--
tracking data: rich and complex
forget the model, what are the observables?
what are we actually observing
more work should be spent on this
how are teams going to handle as we collect more and more data
what are the "sufficient statistics" here
-->

---

# We've still got work to do!

Look at what baseball has done...

* Statcast tracks exit velocity and launch angle <!-- smoothed measurements from technology -->
* Now Statcast bat tracking is publicly available
* Data available via Baseball Savant

--

Can football (Next Gen Stats) do the same thing?

--

This remains a challenge...

---

# It's really that simple

<img src="data:image/png;base64,#baseline.png" width="50%" style="display: block; margin: auto;" />

---

# Related links

* Papers
  * STRAIN: [arxiv.org/pdf/2305.10262](https://arxiv.org/pdf/2305.10262)
  * Fractional tackles: [arxiv.org/pdf/2403.14769](https://arxiv.org/pdf/2403.14769)
* Big Data Bowl submissions: [STRAIN](https://www.kaggle.com/code/statsinthewild/strain-sacks-tackles-rushing-aggression-index) & [Fractional tackles](https://www.kaggle.com/code/tindata/momentum-based-fractional-tackles)

<p style="font-size:1.2em; color: #C41230"> <b> Carnegie Mellon Sports Analytics Conference (#CMSAC) </b></p>

* November 1–2, 2024
* Register at [stat.cmu.edu/cmsac/conference](https://www.stat.cmu.edu/cmsac/conference)

---
count: false
class: center, middle

<p style="font-size:4em; color: #C41230"> <b> Appendix </b> </p>

---

# Valuing a contact window

The value for a contact window is defined as `\(\displaystyle \frac{v_{\text{start}} - v_{\text{end}}}{v_{\text{pre}}} \,.\)`

* If `\(v_{\text{pre}}\)` occurs within the window: replace `\(v_{\text{start}}\)` with `\(v_{\text{pre}}\)`
* If `\(v_{\text{post}} \ge v_{\text{pre}}\)`: window has zero value
* If `\(v_{\text{end}} \le v_{\text{post}} < v_{\text{pre}}\)`: replace `\(v_{\text{end}}\)` with `\(v_{\text{post}}\)` (fraction of unrecovered peak velocity)

<!-- --- -->
<!-- ```{r, echo=FALSE, message=FALSE} -->
<!-- library(tidyverse) -->
<!-- read_csv("~/Downloads/euro_squares.csv") |> -->
<!--
ggplot(aes(low_score, high_score)) + -->
<!-- geom_tile(aes(fill = I(colr)), show.legend = FALSE) + -->
<!-- # geom_text(aes(label = n), color = "black", size = rel(3)) + -->
<!-- scale_x_continuous(breaks = 0:5) + -->
<!-- scale_y_continuous(breaks = 0:5) + -->
<!-- coord_fixed() + -->
<!-- # labs(x = "\nLower Score", -->
<!-- # y = "Higher Score\n", -->
<!-- # title = "UEFA EURO Squares") + -->
<!-- theme_void() + -->
<!-- theme(panel.grid = element_blank(), -->
<!-- strip.text = element_blank()) + -->
<!-- facet_wrap(~ year) -->
<!-- ``` -->
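
---

# Sketch: valuing a contact window in code

The three rules on the "Valuing a contact window" slide can be written as a short Python function. This is a minimal sketch, not the authors' implementation. It assumes the four velocities (toward the end zone, in any consistent unit) have already been extracted from the tracking data, which the slide does not describe. The function name `contact_window_value` and the `pre_in_window` flag are hypothetical.

```python
def contact_window_value(v_start, v_end, v_pre, v_post, pre_in_window=False):
    """Value of one contact window: (v_start - v_end) / v_pre, with adjustments.

    All arguments are assumed inputs (hypothetical names):
      v_start, v_end : ball-carrier velocity at the start and end of the window
      v_pre, v_post  : reference velocities before and after the window
      pre_in_window  : True if v_pre occurs inside the window itself
    """
    if v_pre <= 0:
        raise ValueError("v_pre must be positive to normalize the value")

    # Rule: if the carrier gets back to (or above) v_pre afterwards,
    # the window has zero value.
    if v_post >= v_pre:
        return 0.0

    # Rule: if v_pre occurs within the window, measure the drop from v_pre.
    if pre_in_window:
        v_start = v_pre

    # Rule: if v_end <= v_post < v_pre, only the unrecovered part counts,
    # so the end velocity is replaced by v_post.
    if v_end <= v_post:
        v_end = v_post

    return (v_start - v_end) / v_pre
```

For instance, `contact_window_value(6.0, 2.0, 8.0, 3.0)` replaces the end velocity with `v_post = 3` and returns `(6 - 3) / 8 = 0.375`.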